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Abstract 

Gibbs random fields play an important role in statistics. However they are complicated 
to work with due to an intractability of the likelihood function and there has been 
much work devoted to finding computational algorithms to allow Bayesian inference to 
be conducted for such so-called doubly intractable distributions. This paper extends 
this work and addresses the issue of estimating the evidence and Bayes factor for 
such models. The approach which we develop is shown to yield good performance. 
Supplemental material for this article are available online. 
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1 Introduction 



Gibbs random fields find much use in statistics, for example, the widely used autologistic 



model (Besag 1974) in spatial statistics, and the exponential random graph model in so- 



cial network analysis (Robins et al 2007). The popularity of this class of statistical model 



results from the fact that it is possible to build a joint model with complicated global depen- 
dencies based on local dependencies. Despite their wide use, Gibbs random fields present 
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considerable difficulties from the point of view of parameter estimation, because the likeli- 
hood function is typically intractable for all but trivially small graphs. One of the earliest 



approaches to overcome this difficulty is the pseudolikelihood method (Besag 1972), which 
approximates the joint likelihood function by the product of full-conditional distributions of 
all nodes, and so ignores dependencies beyond ffist order. Recently there has been much 
progress in the literature addressing this problem, based on the fact that it is often possible 
to simulate from the likelihood function, for example, the single auxiliary variable method 



(M0ller et al 2006), the exchange algorithm (Murray et al 2006) and approximate Bayesian 



computation model choice (Grelaud et al 2009), to name a few 



Posterior parameter estimation for Gibbs random fields has been termed a doubly in- 
tractable problem because both the likelihood function and also the posterior distribution are 
intractable. This paper tackles an additional layer of intractability by focusing on estimating 
the statistical evidence, that is, the integral of the un-normalised posterior distribution over 
the parameters of the Gibbs random field and could be termed a triply intractable problem. 

This paper is organised as follows. Section [2] introduces Gibbs random fields and outlines 
the issue of model choice for this class of statistical model. The exchange algorithm upon 
which the innovation in this paper is based is explained in Section |3| Section |4] describes 
how the exchange algorithm can be extended to yield an estimate of the statistical evidence 
and the Bayes factor. Approximate Bayesian computation in the context of model choice for 
Gibbs random fields is explained in Section [5| The performance of the Bayes factor estima- 
tion algorithm is illustrated in Section |6] using models from spatial statistics and statistical 
network analysis. Finally a summary and future directions are presented in Section [7j 

2 Discrete- valued Markov random fields 

Discrete Markov random fields play an important role in several areas of statistics including 
spatial statistics and social network analysis. The autologistic model, popularised by Besag 



(1972) which has the Ising model as a special case, is widely used in the analysis of binary 



spatial data defined on a lattice. The exponential random graph (or p*) model is frequently 



used to model relational network data. See (Robins et al 2007) for an excellent introduction 
to this body of work. 

Despite their popularity, discrete valued Markov random fields (or Gibbs distributions) 
are very problematic to work with from a inferential point of view. We will explain why 
below, and first begin with some preliminaries. Let y = {yi, . . . ^y^} denote realised data 
defined on a set of nodes {!,... ,iV} of a graph, where each observed value yi takes values 
from some finite state space. The likelihood of y given a vector of parameters 6 = (6*1, ... , Om) 
is defined as 

f{y\e)^eMO^s{y)):=q{y\e), (1) 

where s{y) = {si{y), . . . , Sm{y)) is a vector of statistics which are sufficient for the likelihood. 
The constant of proportionality in ([T]), 

zi9) = J2eM0^s{y)), 

y&Y 

depends on the parameters 6, and is a summation over all possible realisation of the Gibbs 
random field. Clearly, z{6) is intractable for all but trivially small situations. 

One of the earliest approaches to overcome the intractability of ([T]) is the pseudolikelihood 



method (Besag 1972) which approximates the joint distribution of y as the product of full- 
conditional distributions for each y^, 

n 

fpseudoiy) = Ylf{yi\y-i,0), 

i=l 

where y-i denotes y\{yi}- This approximation has been shown to lead to unreliable estimates 



oie, for example, ( |Ryden and Titterington 1998D , ( |Friel and Pettitt 2004] ), ( |Friel et al 2009| ). 
Monte Carlo approaches have also been exploited to estimate the intractable likelihood, for 



example the Monte Carlo maximum likelihood estimator of Geyer and Thompson (1992). 
More recently, auxiliary variable approaches have been presented to tackle this problem 



through the single auxiliary variable method (M0ller et al 2006) and the exchange algorithm 



(Murray et al 2006). 
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2.1 Evidence estimation for MRFs is a triply intractable problem 

This paper explores the issue of estimation of the statistical evidence (or marginal likelihood) 

<y) = [ f{y\eMe)de. 



The evidence can then be used to allow one to make probability statements about the model 
itself, which in practical terms amounts to choosing which sufficient statistics to include 
in ([T]). Evaluating the statistical evidence is generally an intractable calculation and one 
typically resorts to some form of Monte Carlo simulation in order to yield an estimate of it. 



For example, the tempered transition algorithm (Neal 1996), annealed importance sampling 



(Neal 2001), bridge sampling (Meng and Wong 1996) and power posteriors (Friel and Pettitt 



2008), which is based on path sampling (Gelman and Meng 1998), all rely on the use of 



a tempering or bridging scheme, typically transitioning from prior to posterior in order 
to estimate the evidence. For example, the power posterior method explores a tempered 
posterior distribution of the form 

n{9\y,t) ^ fiyieyirie), t G [0, 1], (2) 

so that n{9\y,t = 0) and n{9\y,t = 1) correspond to the prior and posterior distribution, 
respectively. The evidence results from a discretised version of the identity 

n{y)= f\e\y,tf{y\e)dt. (3) 

This article also uses tempering in a similar manner, however none of the approaches de- 
scribed above can be directly applied to posterior distributions with intractable likelihoods, 
since each method above (including equations (|2]) and (|3|) assumes that the likelihood func- 
tion f{y\6), can be evaluated for any 6. This is one of the reasons why estimating the 
evidence for doubly intractable distributions is a very challenging task, and is one which 
might be termed a triply intractable problem. This therefore motivates the development of 
an new strategy to overcome this intractability. The approach which we develop is based on 
the exchange algorithm which we now outline. 
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3 The exchange algorithm 



Murray et al (2006) presented an algorithm to allow inference for doubly intractable distri- 



butions extending that presented in (M0ller et al 2006). This algorithm samples from an 
augmented distribution 

Ti{e\y\9\y) ^ f{y\9)Ti{e)h{e'\9)f{y'\e') 

whose marginal distribution for 9 is the posterior of interest. Here f{y'\9') is the same likeli- 
hood model on which y is defined and h{9'\9) is an arbitrary distribution for the augmented 
variable 9' which might depend on 9, for example, a random walk distribution centred at 9. 
The exchange algorithm is described in detail below, where the output of the Markov chain 
at iteration i is denoted by (^*^*\ y^*-''). 

Algorithm 1: Exchange algorithm 

1 Initialise ^(o)'^ ^(o)'. 

2 for i = 1, ... ,1 do 

3 9' ^ h{-\9^'-^^)- 

4 y' ~ f{y\9'y, 

5 Propose to swap/exchange 9' — t- 9^^^; 9^'^^^^ — ?■ 9^^^';y' — y*^*)' and with probability 

V ' fiy'\9') f{y\9^'-'^) n{9i^^^))h{9'\9(-^-^))J ^ ' 

accept this move; 

Otherwise set 9^'-^'^ ^ ^ 

7 end 



Taking a closer look at the acceptance ratio in the exchange algorithm Q, and assuming 
that h is symmetric yields the expression 



a = mm 



ni9') q{y\9') qiy'\9^^^) \ 
' 7r(^«) q{y\9(^)) q{y'\9') J' ^ ^ 
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If one were to apply a naive Metropolis-Hastings algorithm to sample from the target 
f{y\9)'K{9), by using a symmetric proposal to move from 9 to 9', this would yield the accep- 
tance ratio 

. / n{9')qeiy) z{9)\ 
a = mm l, —————— 6 

V 7r(e) qe{y) z[9') J 

which is intractable, due to the ratio z{9)/z{9'). Note that the intractable ratio z{9)/z(9') 
in ([5]) is replaced in ^ by the ratio qe{y') /Qe'iv') which can be interpreted as importance 



sampling estimate of z{9)/z{9') as pointed out in (Murray et al 2006). 

A practical difficulty in implementing the exchange algorithm is the requirement to draw 
an exact sample y' ~ f{-\9'). Perfect sampling is an obvious approach. A pragmatic alter- 
native is to take a realisation from a long MCMC run with stationary distribution f{y'\9') 
as an approximate draw and this is the approach taken by (Caimo and Friel 2011), ( [Cucala 



et al 2009), (Friel and Pettitt 2011), and a theoretical justification for this approximation is 



given by (Everitt 2012). 



4 A population MCMC extension of the exchange al- 
gorithm 

Here we present a population-based MCMC extension of the exchange algorithm which will 
lead to realisations from the posterior distribution p{9\y). We will then illustrate how this 
algorithm can be modified to also allow an unbiased estimate of the normalising constant 
of the likelihood, z{9*)^ for each draw 9* from the posterior distribution. Both of these will 
allow us to estimate the evidence 7r(y). 

A population MCMC approach often involves constructing an augmented target distri- 
bution 

7rio(^o|y) X ■■■ X TltA^n\y), 

where 

T,tAOi\y)^f{y\Oi)'"^m 



6 



and where = to < < " " " < ^n-i < tn = I- The rationale for constructing such an 
augmented target distribution is that this facilitates the states of the population (6^0, . . . , On) 
to interact with one another to slowly move from the prior, tt^q to the posterior, TTt,^. Here 
we extend this framework, by further augmenting the distribution of each chain in the 
population. We define 

7rto(^0,^0,l/01,- • • ,yoJy) X ••• X 7rt„{0n,0n,ynl,---,yns\y) (7) 

where 

and the marginal distribution for 6 at temperature t„ = 1 is the target distribution of interest. 
The population exchange algorithm is described in detail in Algorithm |2] 
A key stage in this algorithm is how the chains interact with one another, and of course 

this is much choice in how the function h{6'j\6Q \ . . . ,6^^l-^^,6^^~^^) is designed. A simple 

approach is to center this proposal on the average of parameters from neighbouring chains, 

for example, 

Obviously more elaborate proposals, such as those based on the differential evolutionary 



Monte Carlo approach of (ter Braak and Vrugt 2008), may also be employed. 



Note that a tempering scheme has also been proposed in Murray et al (2006). There the 



purpose was to show how the tempered transitions algorithm (Neal 1996) could be adopted 
to allow one to efficiently transition to a new parameter by proposing a sequence of inter- 
mediate parameters which connect the current parameters to the proposed parameters. In 
our set up, the rationale for augmenting the target distribution with a sequence of tempered 
distribution is also to facilitate efficient mixing through the state space, but additionally 
to allow estimation of the normalisation constant of the likelihood function, which we now 
explain. 
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Algorithm 2: Population exchange algorithm 



1 Initiahse (0^, ^^'^ vi^J'), • • • , (^r, , , l/^'^/ ) 



(0) nW „,(0)' „,{0)\ 



2 for i = 1, . . . , J do 

for j = 0, . . . ,n do 

Draw y;,. ..,?/; ~ f{y\tj0'j); 
Propose to swap/exchange 

9'^ ^ ; 9f-'^ ^ ^^f; (yl, (yj?', . . . ,i/]f ) and with probability 



a = mm 



accept this move ; 

Otherwise set ^ »f ; »f' '; (j/j*-"', 



end 



j=o \ k=i 'iyyjk rj^j ) 



X ^(0) 



10 end 



4.1 Estimating the normalising constant z{9) 

The population exchange algorithm outlines how to generate draws from the stationary 
distribution in ([T]). Here we will illustrate how this algorithm can be modified to yield 
an estimator of the normalising constant z{9) for a draw 9 from the target distribution at 
temperature t„ = 1. 

At iteration i, when visiting chain j, line 5 of Algorithm [2] yields a draw from the likeli- 
hood, 

y',- f{y\9'^r^ =fW^). 
Assume for the moment that the auxiliary parameter value 9'^ lead to a swap/exchange move 
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that was accepted, so that 6'j = 6'^ We view the algebraic part, q{y\tj9j ) of f{y\tj6- ) 
as an importance distribution for 'target' f{y\tj+id^j\i)- The ratio of the un-normalised 
importance function and un-normalised target gives an unbiased estimate of the ratio of 
their corresponding normalising constants: 

This can be improved by taking additional draws yji, . . . , yjs ~ f{y\tjOj'^) at line 5 of Algo- 
rithm [2| giving an unbiased estimate of z(tj+i6'j*_^]^)/z(tj6'j*''). 

Denoting the current state of the population at iteration i to be 

the auxiliary draws can be used, via importance sampling to yield an estimate of 

m z{t,ef) ziuef)'''"'' z{K.,etS 

Here each ratio on the right hand side of ([9]) is estimated by (|8]). Note that in the case of 
an Ising model, z{Q) = 2^, where, as before N is the size of the lattice. 

This implies that the parameter value, 6j'\ of chain j at iteration i, has an associated 
set of draws yji, . . . , yjs, for i = 1, . . . , s, corresponding to additional auxiliary draws from 
f{y\tj6^''^). In terms of storage, it suffices to store the low dimension sufficient statistics, 
s{yji), . . .,s{yjs). 

4.2 Estimating the evidence 

The developments in this paper have shown that 

1. the population exchange algorithm yields draws, {O^"^^} from the posterior by transi- 
tioning from the prior; 
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2. Additional auxiliary draws, at each iteration, can be used to give an estimate of z{6^'^^). 

This output can be used to estimate the evidence in a very straightforward manner. Re- 
writing Bayes' theorem, it holds for any 6* that, 

_ f{y\e*)n{e*) _ q{y\e*)n{e*) 

n{9*\y) z{9*)n{9*\y)' 

We can approximate this by 

. , q(y\e*)7i(e*) , , 

where 7r(6'*|?/) and z{Q*^ are estimates of 'n[Q*\y^ and z[Q*^ using the output from the pop- 
ulation exchange algorithm. Since in most applications Q is low dimensional (usually less 
than 5), in such situations it should be feasible to use kernel density estimation to estimate 
'n(Q*\y\ Furthermore, since we have estimates of the normalising constant at every draw Q 



from 'K{Q\y\ it makes sense to estimate T^eiy) in (10) for a range of high posterior Q draws 
and take an average of these estimates. For example, we consider the following estimate of 
the evidence 



6=1 

where 6i, . . . ,6r correspond to draws from a high posterior density region. 



4.3 Extending the methodology to estimate Bayes factors 

This methodology can be extended to more complications situations where there may be 
many parameters within each model, as is the case with the exponential random graph 
model which is used in social network analysis. It is also possible to construct a MCMC 
framework which can be used to estimate Bayes factors directly, rather than estimate the 
evidence for each pair of models separately. We outline below how the methods explored 
in this paper may be applied to estimate the Bayes factor between 2 competing (nested) 
models 

Tr{9i\y,mi) and TT{9i,92\y,m2). 
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Consider a distribution which is a geometric average of the un-normahsed posterior distri- 
butions for models 1 and 2: 

vr*(^i, e2\y) = {f{y\e,)n{e,)}'-' {f{y\e^, ^2)vr(^i, 62)}' , (12) 

for < t < 1. Suppose, for ease of exposition, that model mi has an associated parameter 61 
and corresponding sufficient statistic si{y), while model m2 also involves 9i and si{y), and 
an additional parameter 62 with sufficient statistics S2{y)- In the context of discrete Markov 



random fields (12) can be expressed as 



Z{Ui, W2) 

where q{y\0i,t62) = e^p {diSi{y) + t62S2{y)} and its normalising constant is written as 



z(9i,t62). Similar to Section HI augmenting (12) with an auxiliary data y' and parameter 6' 



as follows, will allow an MCMC kernel based on the exchange algorithm to apply. 



71^(01, 02, y', 0[, 0'2\y) = {f{y\0iM0i)fiy'\0[)V~' {f{y\0i,02)7r{0u 02)f{y'K ^^2)}* h{0[,0',\0i 

_ qiy\0i,t02) ^y^Will^M^' 0'\0 0\ 

- Z{0„t02) ^^^^^ "(^^'^^^ Z{0[,t0',) ^(^1' ^21^1,^2). 

In an entirely similar manner to Section |4] a population MCMC strategy can be used, which 
in this situation will yield at the end of each full sweep: 

1. A draw 01 from TT{0i\y,mi), 

2. a draw {0l,0l) from 71(01, 0l\y, 1712), 

3. an estimate of 2; (6^1, 6^2) /2;(^i). 

The output can be used to estimate: 

Tc{y\mi) 
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7i[y\m2) 

f{y\0lmi)7r{0l\rm) 7r(gl, |^, m^) 

H0l\y,mi) f{y\0l0lm2)7r{0l0l\m2) 



qiy\0l,mi)7:{0l 


mi) z{0l0l)7,{0l0l\y,m2) 


z{0i)7r{0l 


y.mi) q(y\0l0lm2)T^{0l0l 


1712) 



for any 01 and {01,01). 
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5 Approximate Bayesian Computation methods 



The exchange algorithm and its population-based variant rely on repeated drawing reali- 
sations from the likelihood. This is also the basis for approximate Bayesian computation 
(ABC), where the goal is also to allow one to make inference for posterior distributions 
with intractable likelihoods. There has been considerable work in this area beginning with 



(Pritchard et al 1999), (Beaumont et al 2002). An excellent review of the main develop- 



ments in ABC is presented in (Marin et al 2012). It is important to note that the class 
of statistical models examined in this paper, Gibbs random fields, are defined in terms of 
sufficient statistics, and therefore the issue of choice of summary statistics with which to 
compare pseudo-data and observed data, and which most ABC algorithms are faced with. 



does not arise here. Didelot et al. (2011 ) present an approach to estimate the evidence in the 
situation where one needs to choose appropriate summary statistics. In the context of model 



choice for Gibbs random fields, (Grelaud et al 2009) present an ABC algorithm to allow one 
to performance inference across the joint model and parameter space. Specifically they make 
the observation that the vector S{y) = {Si{y), . . . , S'a/ (y)) of sufficient statistics for each of 
M competing Gibbs random field models, with model prior vr(m), for m = 1, . . . , M, is itself 
sufficient for the joint distribution over model indices and parameters. In turn this leads to 
the ABC model choice algorithm described in Algorithm |3} 



6 Examples 
6.1 Ising model 

The Ising model, is defined on a regular lattice of size m x m', where n = mm'. It is used to 
model the spatial distribution of binary variables, taking values —1, 1. It is defined in terms 
of a sufficient statistic, 

n 

j=i i^j 
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Algorithm 3: ABC Model Choice 



1 Initialise 6'(°); 



2 for i 



1,. . . , J do 



4 



7 



3 



5 



6 




8 end 



where the notation i ^ j means that the lattice point i is a neighbour of lattice point 
j. Henceforth we assume that the lattice points have been indexed from top to bottom 
in each column and where columns are ordered from left to right. For example, for a first 
order neighbourhood model where an interior point has neighbours {yi-m, Vi-i, Ui+i, Ui+m}- 
Along the edges of the lattice each point has either 2 or 3 neighbours. A second order model 
has in addition a further sufficient statistic, 5*2 (|/), which counts the interaction between 
additional neighbours corresponding to the 4 nearest diagonally adjacent neighbours, with 
corrections to the neighbourhood on the border of the lattice. Thus the two models under 
consideration, mi and m2, are expressed as 



so that, when ^2 = 0, mi is nested within 7712. 

Here we simulated 20 reahsations from a first-order Ising model and a further 20 lattices 
realised from a second order Ising model, all defined on a 10 x 10 lattice. The aim of the 
experiment is to estimate which model, a first or a second order neighbourhood model, best 
explains each dataset, a posteriori. For this experiment the lattices are sufficiently small to 



7r(6'i|y,mi) oc 




T:{0i,92\y,m2) oc 



exp{6'igi(y) + 6'2g2(y)} 



n{9i,92\m2), 
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allow a very accurate estimate of the evidence in the following manner. For model 1712, the 
normalising constant z{6i, 62) can be calculated exactly for a grid of lvalues, which 

can then be plugged into the right hand side of: 

71(0] % Oo \y) OC 7TT 7TT 71(0] ,0o ), z = l, . . . .u. 

Numerically integrating the right hand side over the fine grid yields an accurate estimate of 
7r(?/|m2). This serves as a ground truth against which to compare with the corresponding 
estimates of the model evidence. The evidence 7r(?/|mi) may be similarly estimated. Exact 
calculation of z{0i,02) and the approach described above to get a precise estimate of the 



evidence relies on algorithms developed in (Friel and Rue 2007). A Gaussian prior for Oj, for 



j = 1,2, with zero mean and standard deviation 5 was chosen. 

In terms of implementation, the Bayes factor Population exchange algorithm was run, 
transitioning from mi to 1712, for 20, 000 iterations and 5 chains, where 200 auxiliary iter- 
ations are used to generate an approximate draw from the likelihood, and 200 additional 
draws used for the importance sampling estimates of ratio of the normalising constant at 
parameters between intermediate chains. The choice of the temperature ladder {ti, . . . ,tn} 
is an important aspect of this algorithm. Here we define ti = {i/n)^, for i = 0,. . . ,n, so 
that more temperatures are placed towards the prior than the posterior. Note that in com- 



mon with other tempering methods such as annealed important sampling (Neal 2001), the 



choice of temperature placement is important. An approach towards an automatic method 



for making this choice is presented in (Behrens et al 2011) in the context of the tempered 



transitions algorithm (Neal 1996). Finally, the closest 100 draws to the posterior mean of 
were used in (11) to estimate 7r(?/). 

The ABC model choice algorithm was run for 500, 000 draws from the joint model and 
parameter prior. At each iteration, 200 auxiliary draws were used to approximate a draw from 
the likelihood. This resulted in approximately equivalent computing time to the population 
exchange algorithm. In terms of choosing a tolerance level, we considered two choices, e 
corresponding to the 0.5% quantile of the distances between pseudo-data and the observed 
data (corresponding to 2,500 draws), and also a more restrictive choice of e corresponding 
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Figure 1: (a) Evidence exchange (b) ABC (tolerance equal to the 0.1% quantile) (c) ABC 
(tolerance equal to the 0.5% quantile) 

to the 0.1% quantile (resulting in 500 draws). 

In order to compare the performance of the various algorithm, in Figure [T] we present a 
scatterplot of the true value of 7r(mi|y) against its estimated value 7i{mi\y). We see that the 
evidence estimate based on the population exchange algorithm gives improved performance 
to the ABC model choice algorithm with tolerance level e corresponding to the 0.1% quantile. 
However when the tolerance in the ABC algorithm is set equal to the 0.5% quantile, there 
is a much a large discrepancy between the true value of 7i{mi\y) and the estimated 7i{mi\y). 

Finally, in Figure [2] we display a boxplot of values of the ratio of the true Bayes factor 
to the estimated Bayes factor for each of the 40 datasets. Again this suggests that the 
population exchange algorithm gives improved estimates of the Bayes factors compared to 
the ABC model choice algorithm with tolerance level equal to the 0.1% quantile. While the 
Bayes factor estimates based on using a tolerance level equal to a tolerance level of 0.5% 
does not lead to efficient estimation of the Bayes factor. 



6.2 Exponential random graph model 



Here we explore how the methodology may be applied to the exponential random graph 



(ERG) model (or p*) (Robins et al 2007) which is widely used in social network analysis. 
The ERG model is defined on a random adjacency matrix 1^ of a graph on n nodes (or 
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Figure 2: Ratio of the true Bayes factor to the estimated Bayes factor for (a) Evidence 
exchange (b) ABC (tolerance set equal to the 0.1% quantile) (c) ABC (tolerance set equal 
to the 0.5% quantile) 

actors) and a set of edges (dyadic relationships) {Yij : i = l,...,n;j = l,...,n} where 
Yij = 1 if the pair is connected by an edge, and Yij — otherwise. An edge connecting 
a node to itself is not permitted so Yu — 0. The dyadic variables maybe be undirected, 
whereby Y^j = Yji for each pair (i,j), or directed, whereby a directed edge from node i to 
node j is not necessarily reciprocated. 

The likelihood of an observed network y is modelled in terms of a collection of sufficient 
statistics {si{y), . . . , Sfe(y)}, each with corresponding parameter vector 9 — {9i, . . . , 9k}, 



For example, typical statistics include si{y) = Vij and S2{y) = Z]j<j<ifc yikVjk which are, 
respectively, the observed number of edges and two-stars, that is, the number of configura- 
tions of pairs of edges which share a common node. It is also possible to consider statistics 
which count the number of configuration of k edges which share a node in common, for 
k > 2. A practical problem facing the practitioner is to understand which sufficient statis- 
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Figure 3: Gamaneg graph 
tics to include in the model. We view this is as a Bayesian model selection problem, and 



apply the developed methodology to the following network. The Gamaneg network (Read 



1954), displayed in Figure [3j consists of 16 sub-tribes of the Eastern central highlands of New 
Guinea. Here an edge represents an antagonistic relationship between two actors. 

Here we consider two competing (nested) models. Model mi corresponds to the posterior 
distribution 

TT(9i\y,mi) oc exp {9iSi{y)} n{9i), 

where Si {y) counts the total number of observed edge. Model m2 corresponds to the posterior 
distribution 

Tr{9i,92\y,m2) oc exp{9iSi{y) + 6'2S2(?/))7r(6'i, 6^2), 

where, in addition to the Si{y) statistic, the two-star statistic S2{y), as defined above, is 
also included. For both models the prior distributions 9i and 92 are both set to be A^(0,5^) 
distributions. 



The population exchange algorithm outlined in Section 4^ was implemented with 10 
chains, for 10, 000 overall iterations where 1, 000 auxiliary iterations were used to generate 
an approximate draw of a network from the likelihood. A further 200 draws were used for the 
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importance sampling estimate of the ratio of the normahsing constant between successive 
temperatures. As before, a temperature schedule ti = (i/n)^, for i = 0, . . . ,n, was used. The 
algorithm was implemented in C and took approximately 20 minutes on 3.33Ghz processor 
with 4Gb of memory. The algorithm yielded an estimate -B-F12 = 37.499, suggesting that 
there is considerably stronger support for model mi than for model m2- 

An useful by-product of the population exchange algorithm is that draws at the first and 
last temperatures, correspond to draws from model mi and m2, respectively. The estimated 
posterior densities of parameters from each model is displayed in Figure |4j 




-2.0 -1.5 -1.0 -0.5 -3 -2 -1 1 2 -0.6 -0.4 -0.2 0.0 0.2 

01 I y , mi 01 I y . rn2 02 I y . rn2 



(a) (b) (c) 

Figure 4: Posterior density estimates for gamaneg network: (a) 7T{6i\y,mi); (b) 7r(0i|?/, 7712); 
(c) 7r(6'2|i/,m2). 

The posterior density for parameter 62 in model m2, in Figure |4], illustrates that values 
of 62 close to have strong support, a posteriori, which is consistent with most posterior 
support given to model mi. 

7 Summary 

Calculating the statistical evidence for Gibbs random fields is a challenging task. This article 
has proposed a novel approach to this end. The experiments carried out here suggest that 
this method has some promise, and the next stage will be to explore how this methodol- 
ogy extends to more complicated situations, such as hidden Markov random fields and to 
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larger exponential random graph models, both in the size of the graph, and the number of 
parameters in the model. 
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Supplemental Materials 

C code: The supplemental files for this article include C programs which can be used to 
replicate the Ising model study and exponential random graph example in Section |6] of 
this article. Please see the file README.txt contained within the accompanying tar file 
for more details. 
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